1.
ACM Web Conference 2023 - Proceedings of the World Wide Web Conference, WWW 2023 ; : 2698-2709, 2023.
Article in English | Scopus | ID: covidwho-20236655

ABSTRACT

The spread of online misinformation threatens public health, democracy, and the broader society. While professional fact-checkers form the first line of defense by fact-checking popular false claims, they do not engage directly in conversations with misinformation spreaders. On the other hand, non-expert ordinary users act as eyes-on-the-ground who proactively counter misinformation - recent research has shown that 96% of counter-misinformation responses are made by ordinary users. However, research has also found that two-thirds of the time these responses are rude and lack evidence. This work seeks to create a counter-misinformation response generation model to empower users to effectively correct misinformation. This objective is challenging due to the absence of datasets containing ground-truth ideal counter-misinformation responses, and the lack of models that can generate responses backed by communication theories. In this work, we create two novel datasets of misinformation and counter-misinformation response pairs from in-the-wild social media and crowdsourcing from college-educated students. We annotate the collected data to distinguish poor from ideal responses, where ideal responses are factual, polite, and refute the misinformation. We propose MisinfoCorrect, a reinforcement learning-based framework that learns to generate counter-misinformation responses for an input misinformation post. The framework rewards the generator for increasing politeness, factuality, and refutation attitude while retaining text fluency and relevancy. Quantitative and qualitative evaluation shows that our model outperforms several baselines by generating high-quality counter-responses. This work illustrates the promise of generative text models for social good - here, to help create a safe and reliable information ecosystem. The code and data are accessible at https://github.com/claws-lab/MisinfoCorrect. © 2023 Owner/Author.
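The reward structure described in the abstract can be sketched as a weighted combination of per-attribute scores. This is a minimal illustrative sketch, not the paper's implementation: the attribute names follow the abstract, but the scores, weights, and scoring function are hypothetical placeholders (in practice each score would come from a trained classifier).

```python
# Hypothetical sketch of a composite RL reward: the generator is rewarded
# for politeness, factuality, and refutation while retaining fluency and
# relevancy. All scores and weights below are illustrative, not the paper's.

def counter_response_reward(scores, weights=None):
    """Combine per-attribute scores (each in [0, 1]) into a scalar reward."""
    if weights is None:
        # Assumed weighting: target attributes weighted above retention terms.
        weights = {"politeness": 1.0, "factuality": 1.0, "refutation": 1.0,
                   "fluency": 0.5, "relevancy": 0.5}
    return sum(weights[k] * scores[k] for k in weights)

# Example: attribute scores for one generated counter-response.
scores = {"politeness": 0.9, "factuality": 0.8, "refutation": 0.7,
          "fluency": 0.95, "relevancy": 0.9}
reward = counter_response_reward(scores)
```

In an RL loop, this scalar would be the reward signal used to update the generator's policy.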

2.
5th International Conference on Networking, Information Systems and Security, NISS 2022 ; 2022.
Article in English | Scopus | ID: covidwho-2300967

ABSTRACT

One common data-processing problem in machine learning is class imbalance. Imbalanced classes can bias models toward the majority classes, because machine learning algorithms typically assume that each class contains a roughly similar number of objects. Oversampling, i.e., generating new objects in the minority class, is a common approach to balancing a dataset. In text oversampling, however, loss of semantic meaning often occurs when deep learning algorithms are used. We propose synonym-based text generation for rebalancing an imbalanced COVID-19 online-news dataset. Three deep learning models (MLP, CNN, and LSTM) using TF-IDF and word embedding (WE) features are tested on the original and balanced datasets. The results indicate that both the balance of the dataset and the choice of text representation features affect the performance of the deep learning models. Using balanced data with WE-based deep learning models yields significantly higher classification performance, with gains as high as 4-6% across accuracy, precision, recall, and F1-score. © 2022 IEEE.
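Synonym-based oversampling as described here can be sketched in a few lines: minority-class documents are duplicated with some words swapped for synonyms until all classes reach the majority-class size. This is an illustrative sketch only; the synonym table, swap probability, and class labels are made-up placeholders, not the paper's resources.

```python
import random

# Illustrative sketch of synonym-based text oversampling for an imbalanced
# dataset. The tiny synonym table below is a placeholder; a real system
# would use a proper lexical resource.
SYNONYMS = {"virus": ["pathogen"], "spread": ["transmission"], "cases": ["infections"]}

def synonym_variant(text, rng):
    """Return a copy of `text` with some known words swapped for synonyms."""
    words = text.split()
    for i, w in enumerate(words):
        if w in SYNONYMS and rng.random() < 0.5:
            words[i] = rng.choice(SYNONYMS[w])
    return " ".join(words)

def oversample(docs_by_class, rng=None):
    """Grow every class to the majority-class size via synonym variants."""
    rng = rng or random.Random(0)
    target = max(len(d) for d in docs_by_class.values())
    balanced = {}
    for label, docs in docs_by_class.items():
        new_docs = list(docs)
        while len(new_docs) < target:
            new_docs.append(synonym_variant(rng.choice(docs), rng))
        balanced[label] = new_docs
    return balanced

# Example: a minority class with 2 documents vs. a majority class with 4.
data = {"covid": ["virus spread fast", "cases rise"],
        "other": ["sports news", "market update", "weather report", "match recap"]}
balanced = oversample(data)
```

After balancing, both classes contain four documents, and the original documents are preserved unchanged.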

3.
6th International Conference on Information Technology, InCIT 2022 ; : 434-439, 2022.
Article in English | Scopus | ID: covidwho-2296895

ABSTRACT

Due to the impact of COVID-19, many people have had to quit their jobs. As a result of the epidemic, many are turning to opening online stores on e-commerce platforms, and newcomers to this market are inexperienced at competing with other stores. One critical point that can help them is presenting their story through merchandising, such as telling a story about their store or product. However, writing a good story is not easy, so this research aims to help sellers by constructing linguistic resources that they can use to write stories about their products. In this paper, product-description texts totaling 3,378 tokens were collected and ranked with the Text Ranking method to determine the frequency of commonly used words. The 1,853 words obtained from Text Ranking were then applied to topic modeling using Latent Dirichlet Allocation (LDA). The results show that these words can be categorized into seven food- and beverage-related topics, and sellers can use these words when writing product descriptions. © 2022 IEEE.
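The first stage of the pipeline above, ranking tokens from product descriptions by how frequently they are used, can be sketched as follows. This is a simplified stand-in: the paper uses the Text Ranking method, while plain frequency counting is used here for illustration, and the stopword list and sample documents are hypothetical. The LDA topic-modeling stage (e.g., via scikit-learn's `LatentDirichletAllocation`) would then group the top-ranked words into topics.

```python
from collections import Counter

# Simplified sketch of word ranking over product-description texts.
# Plain frequency counting stands in for the paper's Text Ranking method.

def top_words(descriptions, k=5, stopwords=frozenset({"the", "a", "and", "with"})):
    """Return the k most frequent non-stopword tokens across all texts."""
    counts = Counter(w for text in descriptions
                     for w in text.lower().split() if w not in stopwords)
    return [w for w, _ in counts.most_common(k)]

# Hypothetical food-and-beverage product descriptions.
docs = ["fresh mango sticky rice", "fresh coconut ice cream",
        "mango smoothie with fresh milk"]
ranked = top_words(docs, k=2)
```

Here `ranked` contains the two most frequent content words, which would feed into the topic-modeling step.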

4.
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 ; : 2862-2873, 2021.
Article in English | Scopus | ID: covidwho-1678733

ABSTRACT

The automated transcription of spoken language, and meetings in particular, is becoming more widespread as automatic speech recognition systems become more accurate. This trend has accelerated significantly since the outbreak of the COVID-19 pandemic, which led to a major increase in the number of online meetings. However, the transcription of spoken language has not received much attention from the NLP community compared to documents and other forms of written language. In this paper, we study a variation of the summarization problem over transcriptions of spoken language: given a transcribed meeting and an action item (i.e., a commitment or request to perform a task), our goal is to generate a coherent and self-contained rephrasing of the action item. To this end, we compiled a novel dataset of annotated meeting transcripts, including human rephrasings of action items. We use state-of-the-art supervised text generation techniques and establish a strong baseline based on BART and UniLM (two pretrained transformer models). Because natural speech is often broken and incomplete, the task is shown to be harder than an analogous task over email data. In particular, we show that the baseline models can be greatly improved once they are provided with additional information. We compare two approaches: one incorporating features extracted by coreference resolution, and another in which additional annotations are used to train an auxiliary model to detect the relevant context in the text. Based on a systematic human evaluation, our best models exhibit near-human-level rephrasing capability on a constrained subset of the problem. © 2021 Association for Computational Linguistics
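One simple way to provide a generation model with the "additional information" the abstract refers to is to pair the action-item utterance with the surrounding transcript turns as model input. The sketch below is a hedged illustration, not the paper's method: the window size, separator token, and sample transcript are all assumptions, and in practice the concatenated string would be fed to a seq2seq model such as BART.

```python
# Illustrative sketch: build a context-augmented input string for an
# action-item rephrasing model. Window size and separator are assumptions.

def build_input(transcript, action_idx, window=2, sep=" </s> "):
    """Concatenate the action-item utterance with its surrounding turns."""
    start = max(0, action_idx - window)
    end = min(len(transcript), action_idx + window + 1)
    context = " ".join(transcript[start:end])
    return transcript[action_idx] + sep + context

# Hypothetical meeting transcript; turn 2 contains the action item.
transcript = ["let's sync tomorrow", "ok", "I'll send the doc", "thanks", "bye"]
model_input = build_input(transcript, 2, window=1)
```

The resulting string places the action item first, followed by the local context that a model could use to resolve references like "the doc".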
